This is a brief exploratory data analysis of Airbnb listings versus Zillow listings for Washington, DC, using web-scraped data from roughly June 2024. There are no especially interesting findings. I was trying out R (this is my first R Markdown document) as part of the final project for CS50’s Introduction to Programming with R.
I’m playing with R visualizations.
Loading the Zillow data:
## # A tibble: 6 × 13
## zpid homeStatus marketingStatus price latitude longitude beds baths area
## <chr> <chr> <chr> <int> <dbl> <dbl> <int> <int> <int>
## 1 121035… ComingSoon Coming Soon 4.85e5 38.9 -77.1 2 2 1086
## 2 425748 ForSale For Sale by Ag… 1.15e6 38.9 -77.0 3 4 2540
## 3 435451 ForSale New Constructi… 2.90e7 38.9 -77.1 5 9 16250
## 4 458388 ComingSoon Coming Soon 1.20e6 39.0 -77.1 4 4 2089
## 5 344613… ForSale For Sale by Ag… 4.26e5 38.9 -77.0 3 2 4800
## 6 351580… ForSale For Sale by Ag… 3.25e5 38.9 -77.1 2 1 727
## # ℹ 4 more variables: zestimate <int>, rentZestimate <int>,
## # taxAssessedValue <int>, url <chr>
Loading the Airbnb data, which drops a few erroneous listings ($7,000 hostel beds that should be $70):
##
## Attaching package: 'readr'
## The following object is masked from 'package:scales':
##
## col_factor
## # A tibble: 6 × 25
## listing_id latitude longitude hover_description listing_url price
## <dbl> <dbl> <dbl> <chr> <chr> <dbl>
## 1 3686 38.9 -77.0 Vita's Hideaway https://ww… 67
## 2 3943 38.9 -77.0 Historic Rowhouse Near Monume… https://ww… 82
## 3 4197 38.9 -77.0 Capitol Hill Bedroom walk to … https://ww… 135
## 4 4529 38.9 -76.9 Bertina's House Part One https://ww… 66
## 5 5589 38.9 -77.0 Cozy apt in Adams Morgan https://ww… 130.
## 6 178395 39.0 -77.0 Spare Room for Washington,DC … https://ww… 399
## # ℹ 19 more variables: property_type <chr>, room_type <chr>,
## # accommodates <dbl>, number_of_reviews <dbl>, number_of_reviews_ltm <dbl>,
## # number_of_reviews_l30d <dbl>, first_review <date>, last_review <date>,
## # review_scores_rating <dbl>, reviews_per_month <dbl>, host_id <dbl>,
## # host_name <chr>, host_identity_verified <lgl>, host_listings_count <dbl>,
## # host_total_listings_count <dbl>, license <chr>, neighborhood <chr>,
## # minimum_nights <dbl>, shortNeighborhood <chr>
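The cleaning code is not shown, so the following is only a minimal base R sketch of how such mispriced rows might be dropped. The toy data frame, the `max_shared_price` cutoff, and the filtering rule itself are assumptions for illustration; only the `price` and `room_type` column names come from the output above.

```r
# Toy stand-in for the scraped Airbnb listings (all values are made up).
airbnb <- data.frame(
  listing_id = c(1, 2, 3, 4),
  room_type  = c("Entire home/apt", "Shared room", "Shared room", "Private room"),
  price      = c(135, 70, 7000, 82),
  stringsAsFactors = FALSE
)

# Assumed cutoff: a shared (hostel-style) bed priced above this is treated
# as a data-entry error. The value 1000 is a guess, not from the analysis.
max_shared_price <- 1000

# Flag the suspicious rows, then keep everything else.
is_error <- airbnb$room_type == "Shared room" & airbnb$price > max_shared_price
airbnb_clean <- airbnb[!is_error, ]

print(airbnb_clean)
```

Dropping the rows is the simplest option; rescaling the price (for example dividing by 100) would keep the listings but bakes in a stronger assumption about what went wrong.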
Assign neighborhoods to the Zillow data. This drops 6 rows lacking lat/long coordinates.
## # A tibble: 6 × 15
## zpid homeStatus marketingStatus price beds baths area zestimate
## <chr> <chr> <chr> <int> <int> <int> <int> <int>
## 1 12103562 ComingSoon Coming Soon 485000 2 2 1086 501400
## 2 425748 ForSale For Sale by Agent 1150000 3 4 2540 1139800
## 3 435451 ForSale New Construction 28995000 5 9 16250 NA
## 4 458388 ComingSoon Coming Soon 1195000 4 4 2089 1199000
## 5 344613704 ForSale For Sale by Agent 425900 3 2 4800 403100
## 6 351580057 ForSale For Sale by Agent 325000 2 1 727 NA
## # ℹ 7 more variables: rentZestimate <int>, taxAssessedValue <int>, url <chr>,
## # neighborhood <chr>, shortNeighborhood <chr>, latitude <dbl>,
## # longitude <dbl>
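How the neighborhoods were assigned is not shown. One common approach is a point-in-polygon test against neighborhood boundaries, which with real boundary files would normally be handled by a spatial package. The base R sketch below illustrates the idea with a ray-casting test; the `point_in_polygon` helper, the square "Toy Square" polygon, and the toy listings are all hypothetical.

```r
# Ray-casting test: is the point (px, py) inside the polygon whose
# vertices are given by poly_x / poly_y?
point_in_polygon <- function(px, py, poly_x, poly_y) {
  n <- length(poly_x)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    # Does the edge from vertex j to vertex i straddle the horizontal
    # line through the point?
    if ((poly_y[i] > py) != (poly_y[j] > py)) {
      x_cross <- poly_x[i] +
        (py - poly_y[i]) * (poly_x[j] - poly_x[i]) / (poly_y[j] - poly_y[i])
      # Each crossing to the right of the point flips inside/outside.
      if (px < x_cross) inside <- !inside
    }
    j <- i
  }
  inside
}

# Hypothetical square "neighborhood" (x = longitude, y = latitude).
square_lon <- c(-77.05, -77.00, -77.00, -77.05)
square_lat <- c(38.90, 38.90, 38.95, 38.95)

# Toy Zillow rows; the last one lacks coordinates.
zillow <- data.frame(
  zpid      = c("a", "b", "c"),
  latitude  = c(38.92, 38.80, NA),
  longitude = c(-77.02, -77.10, NA),
  stringsAsFactors = FALSE
)

# Rows without coordinates cannot be placed, so drop them first.
zillow <- zillow[complete.cases(zillow[, c("latitude", "longitude")]), ]

in_square <- mapply(
  point_in_polygon, zillow$longitude, zillow$latitude,
  MoreArgs = list(poly_x = square_lon, poly_y = square_lat)
)
zillow$shortNeighborhood <- ifelse(in_square, "Toy Square", "Unknown")

print(zillow)
```

Points that fall in no polygon get the label "Unknown", matching the label that appears in the neighborhood counts.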
Verify that neighborhoods have been assigned correctly:
The black points are in DC; the red points are in Virginia and Maryland.
Neighborhood counts:
## # A tibble: 40 × 2
## shortNeighborhood n
## <chr> <int>
## 1 "Unknown" 2624
## 2 "NW-mid Brightwood Park, C" 210
## 3 "NE Ivy City, Arboretum, T" 179
## 4 "NW-mid Columbia Heights, " 179
## 5 "NE/NW Edgewood, Bloomingd" 165
## 6 "NE Union Station, Stanton" 140
## 7 "SW Southwest Employment A" 113
## 8 "NW-mid Downtown, Chinatow" 91
## 9 "NW-mid Dupont Circle, Con" 91
## 10 "SE Capitol Hill, Lincoln " 89
## # ℹ 30 more rows
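A count like the one above, followed by removing the "Unknown" group, can be sketched in base R as follows. The toy `listings` data frame and its neighborhood labels are made up; only the `shortNeighborhood` column name and the "Unknown" label come from the output.

```r
# Toy listings with a neighborhood label per row (values are made up).
listings <- data.frame(
  zpid = as.character(1:6),
  shortNeighborhood = c("Unknown", "NW-mid", "NE", "NW-mid", "Unknown", "NW-mid"),
  stringsAsFactors = FALSE
)

# Listings per neighborhood, largest group first.
neighborhood_counts <- sort(table(listings$shortNeighborhood), decreasing = TRUE)
print(neighborhood_counts)

# Keep only the rows that were matched to a neighborhood.
known <- listings[listings$shortNeighborhood != "Unknown", ]
print(nrow(known))
```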
Drop unknown neighborhoods and check that the remaining neighborhoods are coded correctly:
And the Airbnb listings:
## # A tibble: 40 × 5
## shortNeighborhood avg_price_zillow avg_price_airbnb change_zillow
## <chr> <chr> <chr> <chr>
## 1 "NE Brookland, Brentwood, " $875,089.21 $135.76 -12.96%
## 2 "NE Deanwood, Burrville, G" $450,288.53 $125.89 -55.21%
## 3 "NE Eastland Gardens, Keni" $424,333.33 $91.00 -57.80%
## 4 "NE Ivy City, Arboretum, T" $656,705.80 $148.19 -34.68%
## 5 "NE Mayfair, Hillbrook, Ma" $451,578.81 $135.47 -55.09%
## 6 "NE North Michigan Park, M" $762,030.60 $108.62 -24.21%
## 7 "NE Union Station, Stanton" $999,628.07 $173.22 -0.58%
## 8 "NE Woodridge, Fort Lincol" $700,391.23 $178.15 -30.34%
## 9 "NE/NW Edgewood, Bloomingd" $826,102.61 $129.07 -17.83%
## 10 "NE/NW Lamont Riggs, Queen" $644,808.97 $101.35 -35.87%
## # ℹ 30 more rows
## # ℹ 1 more variable: change_airbnb <chr>
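The table does not say what the `change_zillow` percentages are measured against. The rows shown are consistent with each neighborhood average being compared to a single reference price of roughly $1.005 million, but what that reference represents is not stated. The sketch below assumes a percent difference from a common reference and, purely for illustration, uses the mean of the toy averages as that reference; `pct_change` and the toy prices are hypothetical.

```r
# Percent difference of each value from a common reference price.
pct_change <- function(value, reference) {
  (value - reference) / reference
}

# Toy neighborhood averages (made up).
avg_price <- c(n1 = 500000, n2 = 1000000, n3 = 1500000)

# Assumption: the reference is the mean across neighborhoods.
# The actual reference used in the analysis is not stated.
reference <- mean(avg_price)

change <- pct_change(avg_price, reference)

# Format as percentage strings like those in the table.
change_label <- sprintf("%.2f%%", 100 * change)
print(change_label)
```

Because this is a linear rescaling of the average price, the correlation between the two percent-change columns is the same as the correlation between the two average-price columns.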
Create a scatterplot comparing percent change in Zillow versus percent change in Airbnb:
## `geom_smooth()` using formula = 'y ~ x'
It doesn’t look like much of a correlation. Let’s check with a linear fit:
##
## Call:
## lm(formula = avg_price_airbnb ~ avg_price_zillow, data = combined_avg_price)
##
## Residuals:
## Min 1Q Median 3Q Max
## -68.612 -31.890 -4.078 19.389 156.171
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.540e+02 1.230e+01 12.518 4.69e-15 ***
## avg_price_zillow 1.323e-05 9.369e-06 1.412 0.166
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 49.27 on 38 degrees of freedom
## Multiple R-squared: 0.04985, Adjusted R-squared: 0.02485
## F-statistic: 1.994 on 1 and 38 DF, p-value: 0.1661
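For reading a summary like this one, the two numbers that matter most are the R-squared and the p-value on the slope. The sketch below fits the same kind of model to made-up data (the `toy` data frame is hypothetical and unrelated by construction) and pulls those two values out of the `summary()` object. With a single predictor, the slope's p-value equals the p-value of a Pearson correlation test, and R-squared is the squared correlation.

```r
set.seed(1)

# Toy data: 40 "neighborhoods" whose Airbnb price is unrelated to the
# Zillow price by construction (all values are made up).
toy <- data.frame(avg_price_zillow = runif(40, 3e5, 1.5e6))
toy$avg_price_airbnb <- 150 + rnorm(40, sd = 50)

fit <- lm(avg_price_airbnb ~ avg_price_zillow, data = toy)
fit_summary <- summary(fit)

# Share of variance in Airbnb price explained by Zillow price.
r_squared <- fit_summary$r.squared

# p-value for the slope (row is named after the predictor).
slope_p <- fit_summary$coefficients["avg_price_zillow", "Pr(>|t|)"]

# Equivalent check via a Pearson correlation test.
cor_p <- cor.test(toy$avg_price_zillow, toy$avg_price_airbnb)$p.value

print(c(r_squared = r_squared, slope_p = slope_p, cor_p = cor_p))
```

A non-significant slope on 40 neighborhoods is weak evidence of little linear relationship rather than proof that none exists.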
Conclusion:
Airbnb prices have little to do with Zillow prices at the neighborhood level: the fit explains only about 5% of the variance (R-squared = 0.04985), and the slope is not significant (p = 0.166). Perhaps the neighborhoods are not subdivided finely enough, or perhaps vacationers are looking for different things than home buyers.
## # A tibble: 39 × 2
## shortNeighborhood median_airbnb_price
## <chr> <dbl>
## 1 "NE Brookland, Brentwood, " 116.
## 2 "NE Deanwood, Burrville, G" 82.5
## 3 "NE Eastland Gardens, Keni" 68
## 4 "NE Ivy City, Arboretum, T" 129
## 5 "NE Mayfair, Hillbrook, Ma" 81
## 6 "NE North Michigan Park, M" 98
## 7 "NE Union Station, Stanton" 150
## 8 "NE Woodridge, Fort Lincol" 131
## 9 "NE/NW Edgewood, Bloomingd" 120
## 10 "NE/NW Lamont Riggs, Queen" 92
## # ℹ 29 more rows
## # A tibble: 39 × 2
## shortNeighborhood median_zillow_price
## <chr> <dbl>
## 1 "NE Brookland, Brentwood, " 764450
## 2 "NE Deanwood, Burrville, G" 420000
## 3 "NE Eastland Gardens, Keni" 475000
## 4 "NE Ivy City, Arboretum, T" 599000
## 5 "NE Mayfair, Hillbrook, Ma" 450000
## 6 "NE North Michigan Park, M" 649950.
## 7 "NE Union Station, Stanton" 850000
## 8 "NE Woodridge, Fort Lincol" 687475
## 9 "NE/NW Edgewood, Bloomingd" 739000
## 10 "NE/NW Lamont Riggs, Queen" 634950
## # ℹ 29 more rows
Heatmap of Price by Neighborhood:
Let’s estimate the return on investment (ROI) of hypothetically buying a house to rent out on Airbnb, using reviews per month as a proxy for bookings. Note that this is extremely crude and assumption-laden: neighborhoods are large blocks with a lot of diversity within them, there is no granular detail on the type of rental (whole property or shared room?), and each review is assumed, off the cuff, to represent two nights. (One possible refinement would be to base the estimate on future availability, itself an assumption-laden approach.)
## # A tibble: 39 × 7
## shortNeighborhood roi avg_airbnb_price avg_reviews median_zillow_price
## <chr> <chr> <chr> <dbl> <chr>
## 1 SE Fairfax Village, N… 4.09… $161.93 1.87 $177,500.00
## 2 NW-far Cathedral Heig… 3.69… $211.92 2.67 $368,000.00
## 3 SW Southwest Employme… 2.78… $323.01 2.01 $559,900.00
## 4 SE Near Southeast, Na… 2.37… $262.78 2.28 $604,950.00
## 5 NW-far North Clevelan… 2.27… $192.24 1.91 $387,000.00
## 6 SE Woodland/Fort Stan… 2.26… $164.40 1.71 $299,000.00
## 7 NW-mid Downtown, Chin… 2.25… $300.02 1.53 $489,900.00
## 8 SE Twining, Fairlawn,… 2.17… $154.95 2.51 $430,000.00
## 9 NW-mid West End, Fogg… 2.12… $189.69 3.03 $649,000.00
## 10 NW-mid Dupont Circle,… 1.81… $207.48 2.30 $629,000.00
## # ℹ 29 more rows
## # ℹ 2 more variables: median_z_scaled <dbl[,1]>, estimated_revenue <chr>
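The ROI code is not shown, but the first rows of the table are reproduced, to rounding, by taking average nightly price times reviews per month times two nights per review times twelve months, divided by the median Zillow price. The sketch below is that reconstruction, not necessarily the exact calculation used; the `estimate_roi` helper, the twelve-month annualization, and the use of the rounded values printed in the table are assumptions.

```r
# From the text: each review is assumed to represent two nights.
nights_per_review <- 2
# Assumption: revenue is annualized over twelve months.
months_per_year <- 12

# Gross annual revenue divided by purchase price. Ignores all costs
# (mortgage, taxes, fees, cleaning, vacancy beyond the review proxy).
estimate_roi <- function(avg_airbnb_price, avg_reviews_per_month,
                         median_zillow_price) {
  estimated_revenue <- avg_airbnb_price * avg_reviews_per_month *
    nights_per_review * months_per_year
  estimated_revenue / median_zillow_price
}

# First row of the table, using the rounded values as printed.
roi <- estimate_roi(
  avg_airbnb_price      = 161.93,
  avg_reviews_per_month = 1.87,
  median_zillow_price   = 177500
)
print(round(roi, 4))
```

Because the inputs are the rounded table values, the result only approximately matches the maximum ROI reported in the distribution summary.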
ROI per neighborhood distribution (expressed as fractions, so 0.04094 is about 4.1%):
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00182 0.00750 0.01226 0.01392 0.01791 0.04094